{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## PyTorch自动求梯度"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "PyTorch提供[autograd](https://pytorch.org/docs/stable/autograd.html)包能够根据input和forward propagation过程自动构建计算图，并执行反向传播。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Tensor Variable"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Variable是torch.autograd中的数据类型，主要用于封装tensor，进行自动求导。torch.autograd.Variable涉及内容：data, grad, grad_fn, requires_grad, is_leaf"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> data: Tensor(被封装)  \n",
    "> grad: data的梯度  \n",
    "> grad_fn: 梯度函数，创建Tensor的Function，用于自动求导  \n",
    "> requires_grad: 是否需要梯度  \n",
    "> is_leaf: 判断是否是叶节点"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = torch.randn(3,5,3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[[ 0.3934,  1.1281, -0.6747],\n",
      "         [ 2.1436,  0.0740, -1.1003],\n",
      "         [-1.2322,  0.7602,  0.8985],\n",
      "         [-0.4370, -1.1841, -0.7881],\n",
      "         [ 0.7882,  0.5963,  1.5099]],\n",
      "\n",
      "        [[-0.0318,  1.2057, -0.8379],\n",
      "         [ 1.1314,  0.6484,  0.9687],\n",
      "         [-0.0402,  0.4431,  0.4285],\n",
      "         [-3.3623, -0.8627,  0.7036],\n",
      "         [-0.1810, -0.0880, -1.0115]],\n",
      "\n",
      "        [[ 0.9452,  0.1026,  0.4277],\n",
      "         [-0.2794, -1.6374,  1.1138],\n",
      "         [-1.9202, -0.8941,  0.8856],\n",
      "         [-0.5415,  0.7361, -0.3806],\n",
      "         [ 1.0723,  1.1546,  1.7948]]])\n"
     ]
    }
   ],
   "source": [
    "print(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "torch.Size([3, 5, 3])"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x.size()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "torch.Size([3, 5, 3])"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "device(type='cpu')"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x.device"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Funtion类"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "tensor和function结合构建有向无环图(Directed Acyclic Graph, DAG)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = torch.ones(2, 2, requires_grad=True) #人工直接创建，同时梯度跟踪开启，但自身没有grad_fn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[1., 1.],\n",
      "        [1., 1.]], requires_grad=True)\n"
     ]
    }
   ],
   "source": [
    "print(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "None\n"
     ]
    }
   ],
   "source": [
    "print(x.grad_fn)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "y = torch.add(x,2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[3., 3.],\n",
      "        [3., 3.]], grad_fn=<AddBackward0>)\n"
     ]
    }
   ],
   "source": [
    "print(y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<AddBackward0 object at 0x00000252D0187AC0>\n"
     ]
    }
   ],
   "source": [
    "print(y.grad_fn)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "z = torch.ones(2,2)  #默认requires_grad = False"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "None\n"
     ]
    }
   ],
   "source": [
    "print(z.grad_fn)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True False\n"
     ]
    }
   ],
   "source": [
    "print(x.is_leaf, y.is_leaf)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n"
     ]
    }
   ],
   "source": [
    "print(z.is_leaf) #初始创建均是叶节点"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "z = y*y*3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[27., 27.],\n",
      "        [27., 27.]], grad_fn=<MulBackward0>)\n"
     ]
    }
   ],
   "source": [
    "print(z)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "out = z.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[27., 27.],\n",
      "        [27., 27.]], grad_fn=<MulBackward0>)\n"
     ]
    }
   ],
   "source": [
    "print(z)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor(27., grad_fn=<MeanBackward0>)\n"
     ]
    }
   ],
   "source": [
    "print(out)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "out.requires_grad"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<MeanBackward0 at 0x252d076c2e0>"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "out.grad_fn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## autograd 自动求梯度"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. torch.autograd.backward "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```\n",
    "torch.autograd.backward(tensors,\n",
    "                        grad_tensors=None,\n",
    "                        retain_grad=None,\n",
    "                        create_graph=False)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 功能： 自动求梯度  \n",
    "> tensors: 用于求导的张量  \n",
    "> retain_graph: 保存计算图；由于pytorch采用动态图机制，在每一次反向传播结束之后，计算图都会释放掉。如果想继续使用计算图，就需要设置参数retain_graph为True\n",
    "> create_graph: 创建导数计算图，用于高阶求导，例如二阶导数、三阶导数等\n",
    "> grad_tensors: 多梯度权重；当有多个loss需要去计算梯度的时候，设计各个loss之间的权重比例。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. torch.autograd.grad"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```\n",
    "torch.autograd.grad(outputs, \n",
    "                    inputs,  \n",
    "                    grad_outputs=None,  \n",
    "                    retain_graph=None,  \n",
    "                    create_graph=False)\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<torch._C.Generator at 0x252c8afd9f0>"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import torch\n",
    "torch.manual_seed(10)  #用于设置随机数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
    "w = torch.tensor([1.], requires_grad=True)  #创建叶子张量，并设定requires_grad为True\n",
    "x = torch.tensor([2.], requires_grad=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = torch.add(w, x)   #执行运算并搭建动态计算图\n",
    "b = torch.add(w, 1)\n",
    "y = torch.mul(a, b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [],
   "source": [
    "y.backward(retain_graph=True) #y开始反向传播"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([10.])\n"
     ]
    }
   ],
   "source": [
    "print(w.grad)   #y对w求导"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([6.], grad_fn=<MulBackward0>)"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<bound method Tensor.backward of tensor([6.], grad_fn=<MulBackward0>)>"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y.backward"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 理解：Tensor.backward()中当tensor不是标量的时候，需要一个与tensor同等形状的入参tensor。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "x: tensor([2., 3., 4., 5.], requires_grad=True)\n",
      "\n",
      "y: tensor([0., 0., 0.])\n"
     ]
    }
   ],
   "source": [
    "x = torch.tensor([2.,3.,4.,5.], requires_grad=True)\n",
    "y = torch.zeros(3)\n",
    "print('x: %s\\n' %x)\n",
    "print('y: %s' %y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "y: tensor([ 6., 12., 20.], grad_fn=<CopySlices>)\n"
     ]
    }
   ],
   "source": [
    "y[0] = x[0]*x[1]\n",
    "y[1] = x[1]*x[2]\n",
    "y[2] = x[2]*x[3]\n",
    "print('y: %s' %y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {},
   "outputs": [],
   "source": [
    "y.backward(torch.ones(y.shape), retain_graph=False) \n",
    "# retain_graph=False 只能进行一次反向传播\n",
    "# retain_graph=True 每次BP的梯度都会保留，并且递加新的梯度"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([3., 6., 8., 4.])"
      ]
     },
     "execution_count": 110,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x.grad"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在PyTorch的backward模块里，如果tensor是一个vector不是scalar，需要入参里添加要给和tensor（这里用y表示，$y_0$, $y_1$, $y_2$）相同形状的vector（这里使用w表示, $w_0$, $w_1$, $w_2$）。针对这一现象，做出一些解释。\n",
    ">解释如下：  \n",
    "![](https://upload-images.jianshu.io/upload_images/8570704-86ba34bcadd2f23a.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如上图所示，加入同形状的$w$使得计算的逻辑统一起来，要不然$y_0$,$y_1$,$y_2$的BP梯度需要分开单独求，计算上会复杂，速度也会受限。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
